Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques
specifically developed for analyzing and understanding the inner workings and
representations acquired by neural models of language. Approaches included:
systematic manipulation of input to neural networks and investigating the
impact on their performance, testing whether interpretable knowledge can be
decoded from intermediate representations acquired by neural networks,
proposing modifications to neural network architectures to make their knowledge
state or generated output more explainable, and examining the performance of
networks on simplified or formal languages. Here we review a number of
representative studies in each category.
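One recurring approach above is the "diagnostic probe": testing whether interpretable knowledge can be decoded from a network's intermediate representations. A minimal sketch of the idea, using invented toy vectors and a simple nearest-centroid classifier as a stand-in for hidden states extracted from a real model:

```python
# Toy diagnostic probe: can a binary linguistic feature be decoded
# from intermediate representations? The vectors and labels below
# are invented stand-ins for real hidden states.

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_probe(train, test):
    """Fit one centroid per label on (vector, label) pairs, classify
    test vectors by Euclidean distance, and return decoding accuracy."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    centroids = {lab: centroid(vs) for lab, vs in by_label.items()}

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    correct = 0
    for vec, label in test:
        pred = min(centroids, key=lambda lab: dist(vec, centroids[lab]))
        correct += pred == label
    return correct / len(test)

# Toy "hidden states": label 1 clusters near (1, 1), label 0 near (-1, -1).
train = [([1.0, 0.9], 1), ([0.8, 1.1], 1), ([-1.0, -0.9], 0), ([-0.9, -1.2], 0)]
test = [([1.1, 1.0], 1), ([-1.1, -1.0], 0)]
accuracy = nearest_centroid_probe(train, test)
```

High probe accuracy is taken as evidence that the feature is (linearly) recoverable from the representation; the studies reviewed typically use trained linear classifiers over real model activations rather than this toy centroid rule.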
Phonological (un)certainty weights lexical activation
Spoken word recognition involves at least two basic computations. First is
matching acoustic input to phonological categories (e.g. /b/, /p/, /d/). Second
is activating words consistent with those phonological categories. Here we test
the hypothesis that the listener's probability distribution over lexical items
is weighted by the outcome of both computations: uncertainty about phonological
discretisation and the frequency of the selected word(s). To test this, we
record neural responses in auditory cortex using magnetoencephalography, and
model this activity as a function of the size and relative activation of
lexical candidates. Our findings indicate that towards the beginning of a word,
the processing system indeed weights lexical candidates by both phonological
certainty and lexical frequency; however, later into the word, activation is
weighted by frequency alone.
Comment: 6 pages, 4 figures, accepted at: Cognitive Modeling and Computational Linguistics (CMCL) 201
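The hypothesized weighting can be sketched as a simple product of phonological certainty and lexical frequency, normalized over the candidate set. The phoneme probabilities and frequencies below are invented for illustration, not values from the study:

```python
# Toy sketch: early lexical activation proportional to phonological
# certainty times word frequency. All numbers are illustrative.

def lexical_activation(phoneme_probs, lexicon):
    """lexicon maps word -> (initial_phoneme, frequency).
    Returns a normalized activation distribution over candidates."""
    raw = {word: phoneme_probs.get(ph, 0.0) * freq
           for word, (ph, freq) in lexicon.items()}
    total = sum(raw.values())
    return {word: a / total for word, a in raw.items()}

# Uncertain /b/-/p/ percept: discretisation favors /b/ but not decisively.
phoneme_probs = {"b": 0.7, "p": 0.3}
lexicon = {"bear": ("b", 50), "pear": ("p", 20), "bean": ("b", 10)}
act = lexical_activation(phoneme_probs, lexicon)
```

Under this toy model, a frequent word matching the more probable phoneme ("bear") dominates the candidate set, while the paper's later-in-word finding would correspond to dropping the phoneme term and weighting by frequency alone.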
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Deep architectures such as Transformers are sometimes criticized for having
uninterpretable "black-box" representations. We use causal intervention
analysis to show that, in fact, some linguistic features are represented in a
linear, interpretable format. Specifically, we show that BERT's ability to
conjugate verbs relies on a linear encoding of subject number that can be
manipulated with predictable effects on conjugation accuracy. This encoding is
found in the subject position at the first layer and the verb position at the
last layer, but distributed across positions at middle layers, particularly
when there are multiple cues to subject number.
Comment: To appear in Findings of the Association for Computational Linguistics: EMNLP 202
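The causal-intervention logic can be illustrated in miniature: if a feature is encoded along a linear direction, reflecting a hidden state across the hyperplane orthogonal to that direction should flip the model's prediction for that feature. The vectors, direction, and read-out below are invented stand-ins, not actual BERT components:

```python
# Toy causal intervention on a linear encoding. If subject number lives
# along unit direction d, negating the projection onto d should flip a
# linear read-out's number prediction with otherwise minimal change.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_number(h, d):
    """Stand-in linear read-out: positive projection on d -> 'plural'."""
    return "plural" if dot(h, d) > 0 else "singular"

def intervene(h, d):
    """Reflect h across the hyperplane orthogonal to unit vector d,
    flipping only the linearly encoded feature."""
    proj = dot(h, d)
    return [x - 2 * proj * di for x, di in zip(h, d)]

d = [1.0, 0.0]            # hypothesized subject-number direction
h = [0.8, 0.3]            # hidden state for a plural subject (toy)
before = predict_number(h, d)
after = predict_number(intervene(h, d), d)
```

A predictable before/after flip of this kind, applied to real hidden states, is the sort of evidence the paper uses to argue the encoding is linear and causally used for conjugation.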
When a sentence does not introduce a discourse entity, Transformer-based models still sometimes refer to it
Understanding longer narratives or participating in conversations requires
tracking of discourse entities that have been mentioned. Indefinite noun
phrases (NPs), such as 'a dog', frequently introduce discourse entities but
this behavior is modulated by sentential operators such as negation. For
example, 'a dog' in 'Arthur doesn't own a dog' does not introduce a discourse
entity due to the presence of negation. In this work, we adapt the
psycholinguistic assessment of language models paradigm to higher-level
linguistic phenomena and introduce an English evaluation suite that targets the
knowledge of the interactions between sentential operators and indefinite NPs.
We use this evaluation suite for a fine-grained investigation of the entity
tracking abilities of the Transformer-based models GPT-2 and GPT-3. We find
that while the models are to a certain extent sensitive to the interactions we
investigate, they are all challenged by the presence of multiple NPs and their
behavior is not systematic, which suggests that even models at the scale of
GPT-3 do not fully acquire basic entity tracking abilities.
Comment: To appear at NAACL 202
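The evaluation logic behind such a suite can be sketched as a minimal-pair comparison: an affirmative context introduces the entity, the negated context does not, and a model should prefer a referential continuation after the former. The scorer below is an invented stand-in; a real test item would query GPT-2/GPT-3 log-probabilities of the continuation given each context:

```python
# Toy minimal-pair item for entity tracking: an indefinite NP under
# negation should not license later reference. `toy_score` is a
# hypothetical stand-in for a language model's log-probability.

def toy_score(context, continuation):
    """Stand-in scorer that penalizes referring to an entity whose
    introduction was blocked by negation in the context."""
    penalty = -5.0 if "doesn't" in context or "not" in context else 0.0
    return penalty - 0.1 * len(continuation.split())

def entity_tracking_item(affirmative, negated, continuation, score):
    """True if the model prefers the referential continuation after
    the context that actually introduces the entity."""
    return score(affirmative, continuation) > score(negated, continuation)

passed = entity_tracking_item(
    "Arthur owns a dog.",
    "Arthur doesn't own a dog.",
    "The dog is friendly.",
    toy_score,
)
```

Aggregating pass rates over many such items, with varying operators and multiple NPs, yields the fine-grained behavioral profile the paper reports.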
- …